ABSTRACT

We show that extreme value statistics are useful for studying the largest structures 
in the Universe by using them to assess the significance of two of the most dramatic 
structures in the local Universe - the Shapley supercluster and the Sloan Great Wall. 
If we assume that the Shapley concentration (volume $\approx 1.2 \times 10^5\,h^{-3}\,{\rm Mpc}^3$) evolved
from an overdense region in the initial Gaussian fluctuation field, with currently popular
choices for the background cosmological model and the shape and amplitude $\sigma_8$ of
the initial power spectrum, we estimate that the total mass of the system is within 20
percent of $1.8 \times 10^{16}\,h^{-1}M_\odot$. Extreme value statistics show that the existence of this
massive concentration is not unexpected if the initial fluctuation field was Gaussian,
provided there are no other similar objects within a sphere of radius $200\,h^{-1}$ Mpc centred
on our Galaxy. However, a similar analysis of the Sloan Great Wall, a more distant
($z \sim 0.08$) and extended concentration of structures (volume $\approx 7.2 \times 10^5\,h^{-3}\,{\rm Mpc}^3$)
suggests that it is more unusual. We estimate its total mass to be within 20 percent of
$1.2 \times 10^{17}\,h^{-1}M_\odot$ and we find that even if it is the densest such object of its volume
within $z = 0.2$, its existence is difficult to reconcile with the assumption of Gaussian
initial conditions if $\sigma_8$ was less than 0.9. This tension can be alleviated if this structure
is the densest within the Hubble volume. Finally, we show how extreme value
statistics can be used to address the question of how likely it is that an object like the 
Shapley Supercluster exists in the same volume which contains the Sloan Great Wall, 
finding, again, that Shapley is not particularly unusual. Since it is straightforward to 
incorporate other models of the initial fluctuation field into our formalism, we expect 
our approach will allow observations of the largest structures - clusters, superclusters 
and voids - to provide relevant constraints on the nature of the primordial fluctuation 
field. 

Key words: methods: analytical - dark matter - large scale structure of the universe 
- galaxies: clusters: general 



1 INTRODUCTION



Since its discovery (Shapley 1930) the Shapley Supercluster
has been the object of considerable interest because
it potentially contributes significantly to the velocity
field in the local Universe (e.g. Scaramella et al. 1989;
Raychaudhury et al. 1991) and because the existence of extremely
massive objects such as Shapley constrains the amplitude
of the initial fluctuation field, and possibly the hypothesis
that this field was Gaussian.

Recent studies suggest that the Shapley Supercluster
contains a few times $10^{16}\,h^{-1}M_\odot$, is overdense by a factor
of order 2, and is receding from us at about 15,000 km s$^{-1}$.
These conclusions are based on studies of the motions
of galaxies (Quintana et al. 2000; Reisenegger et al. 2000;
Proust et al. 2006; Ragone et al. 2006) and estimates of the
masses of X-ray clusters in this region (Reiprich et al. 2002;
de Filippis et al. 2005). In addition, the fact that this region
is over-abundant in rich clusters also allows an estimate
of its mass (Muñoz & Loeb 2008), not all of which
may actually be bound to the system (Dünner et al. 2007;
Araya-Melo et al. 2008). Whereas the other methods are
observationally grounded, the mass estimate from this last
method (i.e. from the over-abundance of rich clusters) fol- 
lows from the assumption that the initial fluctuation field 
was Gaussian. Here, we refine this estimate of the total mass 
of Shapley and compare it with the answer to the ques- 
tion: What is the probability distribution of the mass of 
the most massive object, having the volume of Shapley, if 
it formed from Gaussian initial conditions? We use extreme 
value statistics to address this question. Although we do 
not explore this here, we note that our methods are easily 
extended to incorporate non-Gaussian initial conditions. 

Section 2 summarizes a number of properties of the
Shapley supercluster. Sections 3 and 4 describe our methods
based on the excursion set approach and extreme value
statistics, and what they imply for objects like Shapley,
for which accurate estimates of the masses of the constituent
clusters are available. Section 5 shows how to
extend these approaches to study the Sloan Great Wall
(Gott et al. 2005), for which accurate mass estimates of
the components are not available. This requires combining
a halo model (e.g., Cooray & Sheth 2002) analysis of the
galaxy population with a catalog of groups identified in this 
distribution. For the SDSS, we use the clustering and group 
analyses of Zehavi et al. (2005) and Berlind et al. (2006), 
respectively. 

A final section summarizes our results, shows how ex- 
treme value statistics can be used to answer the question 
of how unusual it is that an object like the Shapley Super- 
cluster exists in the same volume which contains the Sloan 
Great Wall, and discusses how our methods allow observa- 
tions of the largest structures - clusters, superclusters and 
voids - to place interesting constraints on the nature of the 
initial fluctuation field. Where necessary we assume a flat
$\Lambda$CDM model with $(\Omega_m, \Omega_b, h, \sigma_8) = (0.27, 0.046, 0.72, 0.8)$,
but we also explore other choices of $\sigma_8$.



2 THE SHAPLEY SUPERCLUSTER 

The largest redshift survey which includes the Shapley
supercluster suggests that it contains 8632 galaxies
(Proust et al. 2006). These have been grouped into
122 systems of galaxies with 4 or more members
(Ragone et al. 2006). We run a percolation algorithm on this
catalog to identify the largest supercluster in this region.
To do so, we neglect the peculiar velocity of the clusters:
i.e., each cluster is assigned coordinates $x_1 = r\cos\delta\cos\alpha$,
$x_2 = r\cos\delta\sin\alpha$ and $x_3 = r\sin\delta$, where $(\alpha, \delta)$ are its
celestial coordinates and $r = cz/H_0$. Figure 1 shows the pie
diagram of these systems. Solid dots show the 40 systems
belonging to the Shapley Supercluster when we use a linking
length of $8\,h^{-1}$ Mpc. According to the virial masses computed
by Ragone et al., 15 of these 40 clusters have masses
larger than $10^{14}\,h^{-1}M_\odot$. Summing the masses of these 40
clusters yields $5.42 \times 10^{15}\,h^{-1}M_\odot$. The total mass is expected
to be considerably larger than this, because lower
mass groups and galaxies are expected to contribute significantly
to the total. Ragone et al. (2006) use mock catalogs,
based on the VLS simulation of Yoshida et al. (2001), to
account for this missing mass, and conclude that the total
mass of Shapley is likely to be about $1.6 \times 10^{16}\,M_\odot$.
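The percolation step described above can be illustrated with a minimal friends-of-friends sketch. It assumes positions are assigned from $r = cz/H_0$ with $H_0 = 100\,h$ km s$^{-1}$ Mpc$^{-1}$, so that distances come out in $h^{-1}$ Mpc. The function names and the four sample systems are hypothetical and are not taken from the Ragone et al. catalog.

```python
import math

def to_cartesian(ra_deg, dec_deg, cz_km_s, h0=100.0):
    """Redshift-space position in h^-1 Mpc, using r = cz/H0 (peculiar velocities neglected)."""
    r = cz_km_s / h0
    a, d = math.radians(ra_deg), math.radians(dec_deg)
    return (r * math.cos(d) * math.cos(a),
            r * math.cos(d) * math.sin(a),
            r * math.sin(d))

def percolate(points, link_length):
    """Group points connected by chains of separations <= link_length (union-find)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= link_length:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    # Largest group first
    return sorted(groups.values(), key=len, reverse=True)

# Hypothetical systems given as (ra, dec, cz); illustrative values only.
systems = [(201.0, -31.0, 14500.0), (201.5, -31.5, 14700.0),
           (202.0, -32.0, 15000.0), (150.0, 10.0, 9000.0)]
largest = percolate([to_cartesian(*s) for s in systems], link_length=8.0)[0]
print(sorted(largest))
```

The pairwise loop is quadratic in the number of systems, which is adequate for a catalog of about a hundred groups; a spatial index would be the natural replacement for much larger samples.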

To quantify the shape of the Shapley supercluster, we
compute the eigenvalues of the inertia tensor

$$I_{ij} = \sum_k m_k\,x_{ki}\,x_{kj} \qquad {\rm where}\ i, j = 1, 2, 3, \qquad (1)$$

where $m_k$ is the mass of each cluster, the coordinates $x$
are centered on A3558, and the sum is only over the cluster
members. We find the three eigenvalues 8.30, 5.48, and
2.73 $h^{-1}$ Mpc. If we neglect the fact that the 40 cluster members
have masses in the range $[0.008, 6.717] \times 10^{14}\,h^{-1}M_\odot$,
and set $m_k = 1$ for all $k$'s, the eigenvalues of the tensor of
inertia are 7.69, 6.02, and 3.42 $h^{-1}$ Mpc; i.e., they are not
substantially different from the previous values.

Figure 1. Systems of galaxies of the (Ragone et al. 2006) sample
in redshift space. Solid dots are the clusters belonging to the
Shapley supercluster according to a percolation analysis with
percolation length $8\,h^{-1}$ Mpc.

As a check, we have also applied our percolation analysis
to an X-ray survey of this region, which shows 41 extended
sources (de Filippis et al. 2005). A link length of $8\,h^{-1}$ Mpc
links 8 clusters, and returns a total mass in X-ray clusters
of $1.65 \times 10^{15}\,h^{-1}M_\odot$, where we estimated the mass of each
cluster as follows:

$$\frac{M_{200}}{h^{-1}M_\odot} = \left[\frac{L_X}{10^{A+40}\,h^{-2}\,{\rm erg\,s^{-1}}}\right]^{1/\alpha}, \qquad (2)$$

where $A = -22.1 \pm 1.3$ and $\alpha = 1.807 \pm 0.084$
(Reiprich et al. 2002).$^1$ With this recipe, only 5 out of
the 8 members have masses larger than $10^{14}\,h^{-1}M_\odot$.
It is reassuring that these numbers are smaller than
those of Ragone et al. (2006), because this sample of X-ray
clusters with known redshifts is clearly incomplete
(de Filippis et al. 2005). Therefore, in what follows, we use
the cluster catalogue from Ragone et al. (2006), rather than
from the X-ray data.

1 This differs slightly from Muñoz & Loeb (2008), who assume
that $M_{200} \propto L_X^{1/1.6}$.
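As a sketch of how the luminosity-mass recipe of equation (2) might be applied, the snippet below converts an X-ray luminosity to $M_{200}$ and back. It assumes the central values of $A$ and $\alpha$ as read from the text and ignores their quoted uncertainties; the function names are illustrative.

```python
A = -22.1      # normalisation as read from the text (uncertainty ignored)
ALPHA = 1.807  # slope as read from the text (uncertainty ignored)

def m200_from_lx(lx):
    """Equation (2): M200 in h^-1 Msun from L_X in h^-2 erg/s."""
    return (lx / 10.0 ** (A + 40.0)) ** (1.0 / ALPHA)

def lx_from_m200(m200):
    """Inverse relation: L_X in h^-2 erg/s from M200 in h^-1 Msun."""
    return 10.0 ** (A + 40.0) * m200 ** ALPHA

# Luminosity that corresponds to the 10^14 h^-1 Msun threshold used in the text
lx_threshold = lx_from_m200(1.0e14)
print(f"L_X at 1e14 h^-1 Msun: {lx_threshold:.3e} h^-2 erg/s")
```

Because $\alpha > 1$, a given fractional error in luminosity maps to a smaller fractional error in mass.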



3 THE EXCURSION SET APPROACH 

The previous section suggests that the total mass of the
Shapley supercluster is at least $5 \times 10^{15}\,h^{-1}M_\odot$. In this
section, we make a rather different estimate of the total
mass. According to Ragone et al. (2006), the inner
$31\,h^{-1}$ Mpc of Shapley centered on A3558 contains 58 galaxy
systems: 19 of these have mass greater than $10^{14}\,h^{-1}M_\odot$.
For such high masses, it is reasonable to equate each cluster
with a single halo. Integrating the halo mass function
(Sheth & Tormen 1999) from this lower limit to infinity
shows that the expected number in randomly placed spheres
of this radius is only 2.67. This number depends on $\sigma_8$:
reducing $\sigma_8$ to 0.7 changes the expected count to 1.77;
increasing to 0.9 makes the count 3.5. Neither of these numbers is
close to that observed.
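To give a rough sense of how far the observed count of 19 lies from these expectations, the sketch below evaluates the Poisson probability of finding at least 19 clusters for each of the three quoted means. Treating the counts in a mean-density sphere as Poisson is an assumption made here purely for illustration.

```python
import math

def poisson_tail(n_obs, mean, n_terms=200):
    """P(N >= n_obs) for a Poisson distribution, summed in log space for stability."""
    return sum(math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))
               for k in range(n_obs, n_obs + n_terms))

# Expected counts quoted in the text for sigma_8 = 0.7, 0.8, 0.9
for sigma8, mean in ((0.7, 1.77), (0.8, 2.67), (0.9, 3.5)):
    print(f"sigma_8 = {sigma8}: P(N >= 19) = {poisson_tail(19, mean):.2e}")
```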

However, if Shapley is an overdense region, then the
relevant comparison is not with the expected counts in
a region of average density, but one which is overdense
(Muñoz & Loeb 2008). In theories of structure formation
from Gaussian initial conditions, massive halos are expected
to be more abundant in dense regions, and the mix of halos is
expected to also be different. In dense regions, the halo mass
function is expected to be top-heavy (Frenk et al. 1988;
Mo & White 1996; Sheth & Tormen 2002), so this is an
immediate signal that Shapley must be overdense in dark matter
(Muñoz & Loeb 2008). Measurements in the SDSS indicate
that the halo mass function in regions which are overdense
in galaxies is indeed top-heavy (Skibba et al. 2006;
Abbas & Sheth 2007), so it is interesting to ask if this effect
is sufficient to explain the existence of a region like Shapley.

To make this estimate, we will make the crude assump- 
tion that Shapley is spherical, despite the fact that it is not, 
as we have shown in the previous section. However, by considering
the most massive 19 clusters within a distance of
$31\,h^{-1}$ Mpc from A3558, rather than the system identified
with the percolation analysis, we expect to make this as- 
sumption more reasonable. We will return to the issue of 
triaxiality in the final Discussion section. 

Let $N_\delta$ denote the mean number of halos with mass
above threshold in a region which has volume $V$ and contains
mass $M$ (so the mass overdensity is $1 + \delta = M/\bar\rho V$):

$$N_\delta = \int_{M_{\rm min}}^{M} dm\,N(m, \delta_c|M, V). \qquad (3)$$

This number increases as $M$ increases; the precise dependence
can be computed following arguments in Sheth & Tormen
(2002), which build on the work of Mo & White (1996),
and are within the framework of the excursion set approach
(Lacey & Cole 1993; Bond et al. 1991).$^2$ This approach requires
an estimate of the relation between the overdensity in
linear theory, $\delta_L$, and the actual nonlinear overdensity $1 + \delta$.
We use the spherical model to do this:

$$1 + \delta = \left(1 - \frac{\delta_L}{\delta_{sc}}\right)^{-\delta_{sc}}, \qquad (4)$$

where $\delta_{sc} \approx 1.675$.

2 Note that the procedure followed by Muñoz & Loeb (2008) for
estimating $N_\delta$ will yield large-scale halo bias factors which are
the same as those of Mo & White (1996); these are known to be
inaccurate (Sheth & Tormen 1999). Our procedure produces bias
factors which are in substantially better agreement with simulations.

Figure 2. Expected number of clusters with masses greater than
$10^{14}\,h^{-1}M_\odot$ as a function of the total mass of the supercluster.
The expected number increases as $\sigma_8$ increases.

Let $p(M|V)$ denote the probability that a randomly
placed cell of size $V$ contains mass $M$. If we assume that
halo counts in cells of mass $M$ follow a Poisson distribution
with mean $N_\delta$ (see Sheth & Lemson 1999, for why this is
only accurate for large cells), then the probability that a
cell of size $V$, in which there are $N$ clusters, contains mass
$M$ is

$$p(M|N, V) = \frac{p(N|M, V)\,p(M|V)}{p(N|V)}, \qquad (5)$$

where

$$p(N|V) = \int dM\,p(N|M, V)\,p(M|V), \qquad (6)$$

and the Poisson assumption means

$$p(N|M, V) = \frac{N_\delta^N}{N!}\exp(-N_\delta). \qquad (7)$$

To proceed, we require a model for the probability $p(M|V)$
that a randomly placed cell of size $V$ contains mass $M$.
Now, $p(M|V)$ can be estimated using the same excursion set
framework as is used in the calculation of $N_\delta$ (Sheth 1998).
Alternatively, on large scales, it could also be estimated using
perturbation theory (Bernardeau et al. 2002). On these
large scales, these two approaches are in good agreement:
the shape of $p(M|V)$ which results is reasonably well approximated
by a Lognormal (Lam & Sheth 2008):

$$p(M|V)\,dM = \frac{\exp(-y^2/2\sigma_L^2)}{\sqrt{2\pi}\,\sigma_L}\,\frac{dM}{M}, \qquad (8)$$

where $y = \ln(1 + \delta) + \sigma_L^2/2$, and $\sigma_L^2$ is the variance
in linear theory on scale $V$. For $\sigma_8 = (0.7, 0.8, 0.9)$ and
$V = (4\pi/3)\,31^3\,h^{-3}\,{\rm Mpc}^3$ our linear power spectrum yields
$\sigma_L = (0.23, 0.26, 0.29)$.

Figure 2 shows how $N_\delta$, computed following Sheth &
Tormen (2002), increases with total mass $M$ for our three
choices of $\sigma_8$. This, in equation (5), allows us to constrain
the expected values of $M$. The solid curve in Figure 3
shows $p(M|N, V)$ when $\sigma_8 = 0.8$. Figure 4 shows
$p(M|N, V)$ for $\sigma_8 = 0.7$ (top) and $\sigma_8 = 0.9$ (bottom).
In effect, these are estimates of the total mass, and hence
overdensity, of Shapley. Notice that these distributions shift
slightly with $\sigma_8$. The sense of the trend is easily understood:
When $\sigma_8$ is small then massive halos are rare, so
the environment must be that much more extreme to produce
the observed number of clusters. At the peak values
$\log(M/h^{-1}M_\odot) = (16.28, 16.26, 16.25)$ the associated
overdensities are $(1 + \delta) = (2.07, 1.99, 1.93)$ so the linear
theory overdensities are $\delta_L = (0.60, 0.56, 0.54)$, making
$(\delta_L/\sigma_L) = (2.60, 2.15, 1.86)$. These indicate that Shapley is
not particularly unusual.$^3$ We argue in Section 4.1 that to estimate
the initial 'peak height', it may be more appropriate
to use $\sigma_L(M)$ rather than $\sigma_L(\bar\rho V)$. This yields higher values:
$\delta_L/\sigma_L = (3.35, 2.75, 2.33)$. All these results are summarized
in Table 1. It is remarkable that our analytic estimate of
the total mass is so similar to that derived by Ragone et al.
(2006) using mock catalogs: for $\sigma_8 = 0.9$ (the value in their
mocks), our estimate is only 10% larger than theirs.
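The conversion from the quoted nonlinear overdensities to linear-theory values can be sketched by inverting the spherical model, $1 + \delta = (1 - \delta_L/\delta_{sc})^{-\delta_{sc}}$ with $\delta_{sc} \approx 1.675$. The function names are illustrative. Because the inputs are the rounded values quoted in the text, the outputs should be expected to match the quoted $\delta_L$ and $\delta_L/\sigma_L$ only to within a few hundredths.

```python
DELTA_SC = 1.675  # spherical-collapse parameter quoted in the text

def nonlinear_overdensity(delta_l):
    """Spherical model: 1 + delta from the linear overdensity delta_L."""
    return (1.0 - delta_l / DELTA_SC) ** (-DELTA_SC)

def linear_overdensity(one_plus_delta):
    """Inverse of the spherical model: delta_L from 1 + delta."""
    return DELTA_SC * (1.0 - one_plus_delta ** (-1.0 / DELTA_SC))

sigma_l = (0.23, 0.26, 0.29)         # for sigma_8 = 0.7, 0.8, 0.9
one_plus_delta = (2.07, 1.99, 1.93)  # peak overdensities quoted in the text
for s, d in zip(sigma_l, one_plus_delta):
    dl = linear_overdensity(d)
    print(f"delta_L = {dl:.2f}, delta_L/sigma_L = {dl / s:.2f}")
```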

Upon evaluating an integral that is very similar to the
one which defines $N_\delta$, the excursion set approach also yields
estimates of the typical mass fractions in such clusters. If we
use $f_\delta$ to denote this fraction, then

$$f_\delta = \int_{M_{\rm min}}^{M} dm\,N(m, \delta_c|M, V)\,(m/M). \qquad (9)$$

At the peak values shown in the Figures, $f_\delta = (0.14, 0.18,
0.22)$ for $\sigma_8 = (0.7, 0.8, 0.9)$. Since the total observed mass
in these 19 clusters is $5.27 \times 10^{15}\,h^{-1}M_\odot$, these mass fractions
suggest total Shapley masses of $\log(M/h^{-1}M_\odot) =
(16.57, 16.47, 16.38)$. These values are larger than the peak
values from the excursion set approach, because the expression
above assumes that the observed number of clusters
is equal to $N_\delta$, whereas it is actually larger by a factor
of (1.7, 1.6, 1.5). Increasing $f_\delta$ by these factors reduces
the estimated total Shapley mass to $\log(M/h^{-1}M_\odot) =
(16.34, 16.26, 16.20)$. These values are in excellent agreement
with our estimate above, which was based on the fact that
19 massive clusters were observed, but no other information
about their masses was used, though the agreement is best
for $\sigma_8 = 0.8$.
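The arithmetic in this paragraph can be reproduced with a short sketch. All inputs are the rounded values quoted in the text, so the results agree with the quoted logarithms only to about 0.01 dex; the variable names are illustrative.

```python
import math

m_clusters = 5.27e15          # h^-1 Msun in the 19 clusters
f_delta = (0.14, 0.18, 0.22)  # mass fractions for sigma_8 = 0.7, 0.8, 0.9
excess = (1.7, 1.6, 1.5)      # observed cluster count divided by N_delta

# Total mass if the clusters hold a fraction f_delta of it
naive = [math.log10(m_clusters / f) for f in f_delta]
# Same estimate after scaling f_delta up by the excess in the observed count
corrected = [math.log10(m_clusters / (f * x)) for f, x in zip(f_delta, excess)]

for n, c in zip(naive, corrected):
    print(f"log M = {n:.2f} -> {c:.2f}")
```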

3 For $\sigma_8 = 0.8$, our estimate of $\delta_L/\sigma_L$ is close to that of Muñoz &
Loeb (2008); our estimates of the total mass differ because they
used a substantially larger volume estimate than do we.

Figure 3. Comparison of the excursion set estimate of the mass
of the Shapley supercluster (solid) with the expected mass of the
densest of $N = (200/31)^3$ and the sixth densest of $N = (575/31)^3$
randomly placed cells having the same volume as Shapley (dashed
and dotted), when $\sigma_8 = 0.8$.



4 EXTREME VALUE STATISTICS 

It is interesting to compare the mass estimates derived above
with the mass associated with the densest of $N$ randomly
placed cells, where $N$ is the ratio of the volume in which
Shapley was found to Shapley's own volume. If the masses agree, then this would
suggest that although Shapley is extreme, it is not unusually
so. Note that, despite the similarity, this is a different
question from the one which is more often asked: Is the region
containing Shapley the densest of its size in the entire
sphere centered on our galaxy which contains Shapley?

Given a total survey volume, the mass of the densest of
$N$ cells placed randomly in this volume (i.e., large compared
to the cells) - which we will estimate below - is certainly
smaller than the mass associated with the question that is
more usually asked. This is because one might think of this
densest region as a particularly carefully placed cell. In particular,
one would have to throw a large number of cells
(compared to $N$) before one lands in just the right position
to find this densest region. We discuss the difference between
these two extreme value estimates in Section 4.3. Of course,
both require an assumption about the volume within which
Shapley was found. We will assume that this is a sphere with
radius $200\,h^{-1}$ Mpc, and will discuss how our results depend
on this choice shortly (e.g. following equation 12).

If $P_1(<M|V)$ denotes the probability that the most
massive of the $N = (200/31)^3$ regions of volume $V =
V_{\rm Shapley}$ that are within $200\,h^{-1}$ Mpc is less massive than $M$,
then $P_1(<M|V)$ must equal the probability that each of the
$N \approx 270$ cells is less massive than $M$. Thus

$$P_1(<M|V) = \int_0^M dM'\,p_1(M'|V) \approx P(<M|V)^N, \qquad (10)$$

and, by taking the derivative,

$$p_1(M|V) \approx N\,p(M|V)\,P(<M|V)^{N-1}. \qquad (11)$$

Appendix A discusses this approximation further.

Figure 4. Dependence of the estimated mass of the Shapley supercluster
(solid) and of the densest of $(200/31)^3$ randomly placed
Shapley-sized cells (dashed) on $\sigma_8$.
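A minimal sketch of the extreme value construction, $P_1(<M|V) \approx P(<M|V)^N$, is given below for the Lognormal model of $p(M|V)$ in which $y = \ln(1+\delta) + \sigma_L^2/2$ is Gaussian with variance $\sigma_L^2$. It assumes independent cells, works in units of $1 + \delta = M/\bar\rho V$ rather than mass, and uses illustrative function names.

```python
import math
from statistics import NormalDist

def lognormal_cdf(one_plus_delta, sigma_l):
    """P(<M|V) for the Lognormal model, with 1 + delta = M / (rho_bar V)."""
    y = math.log(one_plus_delta) + 0.5 * sigma_l ** 2
    return NormalDist().cdf(y / sigma_l)

def extreme_cdf(one_plus_delta, sigma_l, n_cells):
    """Probability that the densest of n_cells independent cells lies below M."""
    return lognormal_cdf(one_plus_delta, sigma_l) ** n_cells

def median_extreme(sigma_l, n_cells):
    """Overdensity 1 + delta at which extreme_cdf equals 1/2."""
    y = sigma_l * NormalDist().inv_cdf(0.5 ** (1.0 / n_cells))
    return math.exp(y - 0.5 * sigma_l ** 2)

n_cells = (200.0 / 31.0) ** 3
for sigma_l in (0.23, 0.26, 0.29):
    print(sigma_l, round(median_extreme(sigma_l, n_cells), 2))
```

Working with the median avoids differentiating the distribution; the full density $p_1(M|V)$ would follow by multiplying the Lognormal pdf by $N\,P(<M|V)^{N-1}$.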

Before we use this expression, notice that if $M_{1/2}$ denotes
the median value of the expected mass, i.e., that at
which $P_1(<M_{1/2}|V) = 1/2$, then

$$-\frac{\ln(2)}{N} = \ln[1 - p(>M_{1/2}|V)] \approx -p(>M_{1/2}|V), \qquad (12)$$

where we have assumed that $p(>M|V) \ll 1$ in the tail of
the distribution. This shows that the mass returned by our
approach is approximately the same as that given by setting
$N\,p(>M|V) = 1$ (because $\ln 2$ is of order unity), which
makes intuitive sense. It also illustrates that the mass estimate
depends on $N$: If the large $M$ tail falls exponentially,
then $M_{1/2} \propto \ln(N/\ln(2))$. I.e., the expected mass increases
approximately as $\ln(N)$, so the dependence on $N$, and hence
on our assumption that the survey volume is the comoving volume within
$200\,h^{-1}$ Mpc, is weak.

This means that one can devise a test which asks if
the survey volume which is required to make a certain mass
object the densest of its type does indeed contain only one
such object. Alternatively, if the survey volume is known
but the mass is not, then the assumption that the object is
the most massive actually yields an estimate of its mass. We
will show shortly that Shapley passes either of these tests
for currently acceptable values of $\sigma_8$.

Finally, we note that the mass estimate can be rather
precise. If we use $M_{0.84}$ to denote the value of the mass
below which 84% of the probability lies, namely the value
at $+1\sigma$, then, for an exponentially falling distribution in $M$,
$M_{0.84} \propto \ln(N/\ln(1/0.84))$, so

$$\frac{M_{0.84}}{M_{1/2}} = 1 + \frac{\ln(\ln(2)/\ln(1/0.84))}{\ln(N/\ln(2))} = 1 + \frac{1.38}{\ln(N/\ln(2))}. \qquad (13)$$

For $N = 1000$ the fractional error on $M_{1/2}$ is 0.19, and it
decreases as $\ln(N)$ increases.

4.1 Extremes in the initial conditions

To illustrate the approach, suppose that the pdf associated
with scale $V$ is a Gaussian with variance $\sigma_L^2$. Then the
extreme-value mass and survey volume are related, through
equation (12), by

$${\rm erfc}\left(\frac{\delta_L}{\sigma_L\sqrt{2}}\right) = \frac{2\ln(2)}{V_{\rm survey}/V}, \qquad (14)$$

where $\delta_L$ is related to $M/V$ by equation (4). The previous
section argued that, if $\sigma_8 = 0.8$, then, for an object like
Shapley, $\sigma_L = 0.26$ and $\delta_L/\sigma_L = 2.15$. These values in equation
(14) imply $V_{\rm survey}/V \sim 44$. Since this is substantially
smaller than 270, there should be at least 6 other Shapley-like
objects within $200\,h^{-1}$ Mpc of us. This is unlikely. Alternatively,
requiring $V_{\rm survey}/V = 270$ means $\delta_L/\sigma_L = 2.8$. For
$\sigma_L = 0.26$, the associated nonlinear overdensity is $1 + \delta = 2.6$,
making the estimated mass $10^{16.42}\,h^{-1}M_\odot$. This is about
0.18 dex larger than that from the excursion set approach,
indicating that although Shapley is a rich concentration, it
is not more extreme than one would expect on the basis of
random statistics. Therefore, it would not be unexpected to
find an even more extreme object of its volume in the local
universe.

One can improve on these estimates by noting that if
one is using the linear pdf, then the appropriate smoothing
scale is not $V$ but the associated initial scale $M/\bar\rho$, and $\sigma_L$
should also be computed on the scale $M/\bar\rho$ rather than $V$
(e.g., Lam & Sheth 2008). Since $\sigma_L$ is smaller than before,
$\delta_L/\sigma_L$ will be larger, and we now require



$${\rm erfc}\left(\frac{\delta_L}{\sigma_L(M)\sqrt{2}}\right) = \frac{2\ln(2)}{V_{\rm survey}/[V(1 + \delta)]}. \qquad (15)$$

The result is that $V_{\rm survey}/V \approx 232(1 + \delta) = 464$. Thus, Shapley
is consistent with being the densest of $(200/31)^3$ cells,
so we should not be surprised if we find another comparable
or even more massive object in a survey that is only slightly
deeper. Alternatively, if we set $V_{\rm survey}/V = 270$, then equation
(15) requires Shapley's mass to be $10^{16.245}\,h^{-1}M_\odot$,
which is in good agreement with the excursion set analysis.

Table 1. Estimated initial fluctuation height and mass of the
Shapley supercluster. The values listed in columns 2 and 3 show
that this large concentration of galaxies is not unlikely.
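The numbers quoted in this subsection can be checked with a short sketch of the Gaussian extreme value condition, ${\rm erfc}(\nu/\sqrt{2}) = 2\ln(2)/N_{\rm cells}$, where $\nu = \delta_L/\sigma_L$ and $N_{\rm cells}$ is the ratio of survey volume to (initial) cell volume. The function names are illustrative.

```python
import math
from statistics import NormalDist

def cells_needed(nu):
    """Number of cells for which a fluctuation of height nu is the median extreme."""
    return 2.0 * math.log(2.0) / math.erfc(nu / math.sqrt(2.0))

def height_for_cells(n_cells):
    """Invert the condition above for nu, given the number of cells."""
    tail = math.log(2.0) / n_cells  # erfc(nu/sqrt(2))/2 is the one-sided Gaussian tail
    return NormalDist().inv_cdf(1.0 - tail)

print(round(cells_needed(2.15)))         # height measured with sigma_L on scale V
print(round(cells_needed(2.75)))         # height measured with sigma_L(M)
print(round(height_for_cells(270.0), 2))  # height required if there are 270 cells
```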

4.2 Extremes in the nonlinear field 

It is interesting to contrast this treatment, which uses extreme
value statistics of the initial pdf, with an analysis
based on the nonlinear pdf. In the previous section, we used
the fact that the Lognormal distribution (equation 8) is a
reasonably accurate model. In this case, the distribution of
$\ln(M)$ is Gaussian, so the previous analysis goes through
except that now

$${\rm erfc}\left[\frac{\ln(M/\bar\rho V) + \sigma_L^2/2}{\sigma_L\sqrt{2}}\right] = \frac{2\ln(2)}{V_{\rm survey}/V}. \qquad (16)$$

The associated estimate for $V_{\rm survey}/V = N$ given the excursion
set mass of $10^{16.26}\,h^{-1}M_\odot$ and $\sigma_L = 0.26$ is 270. The
small differences compared to the previous estimates can be
understood as deriving from the fact that the term in brackets
in the erfc above effectively makes Shapley a fluctuation
of height 2.79 (for $\sigma_8 = 0.8$).

In fact, the distribution of the expected mass is skewed.
Hence, to provide a more direct comparison with the mass
estimates from the previous section, which we also expressed
as distributions, the dashed curves in Figures 3 and 4
show equation (11) for the same Lognormal distributions of
$p(M|V)$ that we used in the excursion set calculation. The
overlap between the solid and dashed curves is remarkable,
given how very different these two methods are. E.g., for this
calculation, the most probable mass $M$ decreases as $\sigma_8$ decreases
(dashed curves in Figure 4), because small values of
$\sigma_8$ mean that large deviations from the mean value are rarer;
this trend is opposite to that for the excursion set approach,
where small values of $\sigma_8$ mean massive halos are rarer, so
the total mass $M$ from which to obtain the observed number
of massive halos must be larger. So it is interesting that the
match between these two approaches is slightly better for
$\sigma_8 = 0.8$ than for the other two cases. When $\sigma_8 = 0.8$, then
Shapley is consistent with being the most massive of a random
set of regions of volume $V_{\rm Shapley}$ in the local Universe;
if $\sigma_8 = 0.9$, then Shapley lies at the low end of the expected
extreme-mass distribution; if $\sigma_8 = 0.7$, then it lies at the
high-mass end.

These curves show that, if it is the most extreme object
within $200\,h^{-1}$ Mpc, then the existence of Shapley is easily
accommodated in models with high $\sigma_8$; even $\sigma_8 = 0.7$ is not
problematic. On the other hand, if $\sigma_8 = 0.9$, then we will
not have to increase the survey volume much before we see
another object that is more extreme than Shapley. However,
if $\sigma_8 = 0.7$, then Shapley should be the most extreme object
even in a volume that is larger by a factor of 2. It happens
that there is indeed a very large structure in the volume
which lies just beyond Shapley. The next section studies
this structure in more detail.

But before we do, it is worth noting that our extreme
value mass estimate is rather precise: the widths of the
dashed curves in Figures 3 and 4 are typically less than
0.1 dex. While this level of precision may be surprising, we
note that its origin is understood: setting $N = 270$ in equation
(13) yields a fractional uncertainty of 0.23, which corresponds
to 0.1 dex.
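The precision estimate can be reproduced directly from the ratio $M_{0.84}/M_{1/2} = 1 + \ln(\ln 2/\ln(1/0.84))/\ln(N/\ln 2)$, which holds for an exponentially falling tail. The sketch below evaluates the fractional width for the two values of $N$ mentioned in the text and converts the second to dex; the function name is illustrative.

```python
import math

def fractional_width(n_cells, quantile=0.84):
    """M_quantile / M_median - 1 for an exponentially falling tail in M."""
    numerator = math.log(math.log(2.0) / math.log(1.0 / quantile))
    return numerator / math.log(n_cells / math.log(2.0))

for n in (1000, 270):
    width = fractional_width(n)
    print(f"N = {n}: fractional width = {width:.2f}, "
          f"or {math.log10(1.0 + width):.2f} dex")
```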

4.3 Peaks and extremes 

So far, the extremes we have been considering are associated
with the statistics of randomly placed cells. However,
we noted that we are often more interested in ascertaining
whether or not a particular object is an extreme outlier -
since we have determined the location and size of the object
a priori, treating it as a randomly placed cell is no
longer appropriate. At least for sufficiently overdense extremes,
there is a relatively straightforward way to account
for this difference. This is because sufficiently overdense objects
in the nonlinear density field typically correspond to
large fluctuations in the initial field: i.e., $\nu_L \equiv \delta_L/\sigma_L \gg 1$.
For such objects, it should be a good approximation to assume
they formed from high peaks in the initial field (also
see discussion in Colombi et al. 2011). The expected number
density of peaks above some $\nu_t$ (which we would like to estimate)
is related to the probability that a randomly placed
cell lies above this same threshold as follows. Typically, one
can move the cell which defined the peak around a little bit
without significantly changing the height of the fluctuation
in it. If we think of this as defining a volume around each
peak, then

$$P(>\nu_t) \equiv \frac{{\rm erfc}(\nu_t/\sqrt{2})}{2} = {\rm vol}(>\nu_t)\,n_{\rm pk}(>\nu_t). \qquad (17)$$

If the peak was associated with smoothing scale $R_M$, then
this volume satisfies

$${\rm vol}(>\nu_t) = \frac{(2\pi)^{3/2}R_M^3}{(\gamma R_M/R_*)^3\,(\nu_t^3 - \nu_t)} \qquad {\rm as}\ \nu_t \gg 1 \qquad (18)$$

(Bardeen et al. 1986). This shows that the volume scales approximately
as $\nu_t^{-3}$, with prefactors that can be understood
as follows. The volume of a Gaussian smoothing filter is
$(2\pi)^{3/2}R^3$, so the numerator is the moral equivalent of what
we have been calling the volume of the randomly placed cell
in the initial conditions: $V(1 + \delta)$. This means that

$$\frac{V_{\rm survey}}{V(1 + \delta)}\,P(>\nu_t) = \frac{n_{\rm pk}(>\nu_t)\,V_{\rm survey}}{(\gamma R_M/R_*)^3\,(\nu_t^3 - \nu_t)}. \qquad (19)$$

If we now replace the requirement that $[V_{\rm survey}/V(1 +
\delta)]\,P(>\nu_t) = 1$ with the requirement that $n_{\rm pk}(>
\nu_t)\,V_{\rm survey} = 1$ (see equation 12 and below), then this means
that we now want

$$\frac{V_{\rm survey}}{V(1 + \delta)}\,P(>\nu_t) = \frac{1}{(\gamma R_M/R_*)^3\,(\nu_t^3 - \nu_t)}. \qquad (20)$$

Comparison with equation (15) shows that the required
$V_{\rm survey}$ is reduced by a factor proportional to $(\nu_t^3 - \nu_t)$. For
scale-free spectra, $(\gamma R_M/R_*)^3 = [(n + 3)/6]^{3/2}$, and, for the
large smoothing scales of interest here ($\sim 30\,h^{-1}$ Mpc), we
can think of a $\Lambda$CDM model as having $n$ between 0 and $-1$.
This makes the required $V_{\rm survey}$ smaller by a factor of approximately
$2^{3/2}/\nu_t^3$ or $3^{3/2}/\nu_t^3$. Alternatively, if $V_{\rm survey}/V$
is fixed, then the associated value of $\nu_t$, and hence the associated
mass estimate, will be larger than before. Although
the relation between the value from the peaks calculation
and that for random cells depends on $\nu_t$, at $\nu_t \sim 5$ (the
high peaks of most interest here), the peaks calculation returns
approximately 1 plus the value from the random cells
calculation.

Figure 5. Dependence of the extreme value estimate of the height
of the highest peak on the ratio of survey to peak volumes.
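The shift between the random-cell and peak thresholds can be explored numerically. The sketch below fixes the survey volume so that a height $\nu_{\rm random}$ satisfies the random-cell condition $[V_{\rm survey}/V(1+\delta)]\,P(>\nu) = 1$, then solves the peak condition $[V_{\rm survey}/V(1+\delta)]\,P(>\nu)\,(\gamma R_M/R_*)^3(\nu^3 - \nu) = 1$ by bisection, using the scale-free prefactor $[(n+3)/6]^{3/2}$. The function names and the choice $\nu_{\rm random} = 4$ are illustrative assumptions; the bracket assumes $\nu_{\rm random}$ is large enough that the peak factor exceeds unity there.

```python
import math

def gaussian_tail(nu):
    """One-sided Gaussian tail probability P(>nu) = erfc(nu/sqrt(2))/2."""
    return 0.5 * math.erfc(nu / math.sqrt(2.0))

def peak_prefactor(n_index):
    """(gamma R_M / R_*)^3 = [(n + 3)/6]^(3/2) for a scale-free spectrum."""
    return ((n_index + 3.0) / 6.0) ** 1.5

def peak_threshold(nu_random, n_index):
    """Peak height for the survey volume that makes nu_random the random-cell threshold."""
    target = gaussian_tail(nu_random)
    c = peak_prefactor(n_index)
    lo, hi = nu_random, nu_random + 3.0
    for _ in range(60):  # bisection on a monotonically decreasing function
        mid = 0.5 * (lo + hi)
        if gaussian_tail(mid) * c * (mid ** 3 - mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for n_index in (0.0, -1.0):
    nu_pk = peak_threshold(4.0, n_index)
    print(f"n = {n_index:+.0f}: nu_pk = {nu_pk:.2f}, shift = {nu_pk - 4.0:.2f}")
```

In this sketch the shift comes out as a number of order unity, in line with the qualitative statement in the text.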

We can combine extreme value and peak statistics to
make a slightly more detailed statement. Namely, for a given
ratio of survey to peak volume, what is the expected distribution
of the height of the highest peak? The same logic
which led to equations (10) and (11) implies that

$$p_{1{\rm pk}}(\nu) \approx n_{\rm pk}(\nu)\,V_{\rm survey}\,\exp[-n_{\rm pk}(>\nu)\,V_{\rm survey}]. \qquad (21)$$

(The Appendix discusses how one might go beyond the Poisson/independent
cells assumption.) Figure 5 shows this distribution
for a number of choices of

$$N_{\rm eff} \equiv \frac{V_{\rm survey}\,(\gamma R_{\rm pk}/R_*)^3}{(2\pi)^{3/2}R_{\rm pk}^3}. \qquad (22)$$

To make the plot, we have used the $\nu \gg 1$ approximation
(4.14) of Bardeen et al. (1986) rather than the full expression
for $n_{\rm pk}(\nu)$, since we only expect this analysis to be valid for
$\nu \gg 1$. But this does not affect the main point we wish to
make: that the height of the highest peak is only a weak
function of $N_{\rm eff}$. This is the analogue of the statement we
made previously about the weak dependence of $M_{1/2}$ on $N$.
The lesson is that very large survey volumes are required to
reach large values of $\nu$.

Note in particular, that this analysis is only valid for
$\nu$ larger than the one given by the excursion set analysis of
Shapley, so we will not make numerical estimates of these effects
here. However, in the next section, we will be interested
in larger $\nu$, and this analysis will then be useful.



5 THE SDSS GREAT WALL 

A dramatic structure at $z \sim 0.08$ is seen in the 2dF and
SDSS galaxy surveys. Now known as the Sloan Great Wall
(Gott et al. 2005), it is, like Shapley, a region containing an
overabundance of rich clusters. We would like to perform a
similar exercise to determine if it too can be easily accommodated
in Gaussian theories. However, in this case, we do not
yet have mass estimates of its members, and the appropriate
lower limit in equation (3) is unknown. Therefore, we have
extended our approach as follows.



5.1 Percolation estimates of Wall volume 

We begin with the SDSS percolation catalog of groups
(Berlind et al. 2006). This provides a list of about
4100 groups having three or more members brighter than
$M_r = -19.9$. We perform our own percolation analysis on
this group catalog to identify the members of the Great
Wall. The size of the Wall depends on the parameters of
our percolation analysis; we have found that a link-length
of $8h^{-1}$ Mpc returns a catalog that closely corresponds to
the contiguous structure picked out by eye. This is approximately
given by 0.07 < z < 0.092 and < dec < 6 if
185 < ra < 210 and 0.07 < z < 0.080 and < dec < 6
if 166 < ra < 185. The underlying group catalog and the
Great Wall members identified by our analysis are shown as
dots and filled circles in Figure 6. The Wall defined in this
way contains 2180 galaxies in 335 groups. It has a volume
of approximately $2.3 \times 10^5\,h^{-3}\,{\rm Mpc}^3$, so its effective radius
is about $38h^{-1}$ Mpc; $\sigma_L = 0.212\,(\sigma_8/0.8)$ on this scale.
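
The effective radius quoted above is that of a sphere with the same volume as the Wall. A minimal Python sketch of that conversion follows; the function name `effective_radius` is ours, introduced only for illustration, and the input volume is the value quoted in the text.

```python
import math

def effective_radius(volume):
    """Radius of the sphere whose volume equals `volume`.

    If `volume` is in h^-3 Mpc^3, the result is in h^-1 Mpc.
    """
    return (3.0 * volume / (4.0 * math.pi)) ** (1.0 / 3.0)

# Wall volume for the 8 h^-1 Mpc link-length (value quoted in the text).
wall_volume = 2.3e5
print(f"effective radius: {effective_radius(wall_volume):.1f} h^-1 Mpc")
```

The same function can be applied to any other volume estimate of the structure to see how the effective scale, and hence the rms fluctuation that must be evaluated, changes.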

We note that the Wall appears to extend beyond the 
SDSS footprint towards negative declination. Because this 
cut reduces our estimates of both the number of group mem- 
bers and the total volume, neither our excursion set nor our 
extreme value analyses are strongly affected by this cut. 

Our estimate of the total volume is determined from 
redshift-space quantities. For a structure as large as this, the 
redshift-space volume is smaller than the real-space volume. 
Figure 5 suggests that, along the line of sight, the depth of the
structure varies from about 5000 km s$^{-1}$ to about 2000 km s$^{-1}$. If
we assume that line-of-sight velocities are unlikely to exceed
1000 km s$^{-1}$, then the true structure may be larger in the
redshift direction by a factor of between 1.2 and 1.5. Hence,
we may have underestimated the true volume of the Wall
by this same factor. In Section 5.3 we will show that our
conclusions about whether or not the Wall is unexpected are 
not very sensitive to this uncertainty. 

On the other hand, our choice of link-length makes the
Wall significantly smaller in extent than claimed by Gott et
al. Indeed, our estimate of the Wall's volume makes it only
$(38/31)^3 = 1.8$ times larger than Shapley. A link-length of
about $12h^{-1}$ Mpc is required to get something approaching
their definition (open circles). In this case, the total volume
is about $7.2 \times 10^5\,h^{-3}\,{\rm Mpc}^3$ (effective radius $55h^{-1}$ Mpc),
$\sigma_L = 0.139\,(\sigma_8/0.8)$, and the structure contains 3663 galaxies
in 645 groups. Again, varying the total volume by $\sim 30\%$
makes little difference to the nature of our conclusions below.
More importantly, we will show that although our estimates
of the mass in the Wall do depend strongly on the
link-length used to define the Wall (the longer link-length
yields a Wall with three times the volume, so one naively expects
the mass to be about three times larger as well), our
conclusions about how unusual the Wall is do not depend
strongly on this choice.

Figure 6. The Great Wall in the SDSS (filled circles), identified by a percolation analysis of the SDSS percolation group catalog (dots).
Open circles show the additional members which are included if the percolation link length is increased from $8h^{-1}$ Mpc to $12h^{-1}$ Mpc.

5.2 A halo model-excursion set estimate of the 
Wall mass 

A halo model analysis of the underlying galaxy catalog (i.e.,
SDSS galaxies with $M_r < -19.9$) suggests that only halos
above $M_{\rm min} = 10^{12}\,M_\odot$ host such galaxies. In halos of mass
$m$ which host such galaxies, the probability of hosting $N_s$
additional galaxies (with $M_r < -19.9$) is given by a Poisson
distribution with mean
^K*^r (23) 

(Zehavi et al. 2005). To an excellent approximation, this relation
between the galaxy population and halo mass is independent
of environment (Abbas & Sheth 2007). This is a
key point, because it means that the relation above is expected
to be as accurate for the halos in the Sloan Great
Wall as elsewhere. Moreover, this assumption has also been
shown to accurately reproduce the properties of the galaxies
in the percolation group catalog we are using here
(Skibba et al. 2007).

In the present context, the accuracy of the halo model
decomposition, and of the Poisson distribution of $N_s$ in particular,
means that we expect the fraction of halos of mass
$m$ which host 3 or more galaxies to be

$$f_3(m) = 1 - e^{-\langle N_s|m\rangle}\,\left(1 + \langle N_s|m\rangle\right). \qquad (24)$$

Similar expressions for $f_n(m)$ can be defined for arbitrary $n$.
Hence, the expected number of halos containing $n$ or more
galaxies brighter than $M_r = -19.9$ that are in cells of volume
$V$ containing total mass $M$ is

$$N_s = \int_{M_{\rm min}} dm\, N(m,\delta_c|M,V)\, f_n(m), \qquad (25)$$

where $M_{\rm min} = 10^{12}\,h^{-1}M_\odot$ and $N(m,\delta_c|M,V)$ is the same
quantity as before (cf. equation 3), but with the new value of
$V$. Indeed, the only significant difference from the earlier expression
is that we have now included a factor of $f_n(m)$ to account for
the fact that only a fraction of halos of mass $m$ are expected
to be in the group catalog. Note that this factor does not
depend on $M$ or $V$, because the large scale environment does
not affect equation (23).
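
The fraction $f_n(m)$ is a Poisson tail probability. The sketch below is a minimal Python illustration, assuming (as the wording "additional galaxies" suggests) that each hosting halo contains one galaxy plus $N_s$ Poisson-distributed additional ones, so that hosting $n$ or more galaxies requires $N_s \ge n-1$. The function name and the sample mean value are hypothetical; the mean $\langle N_s|m\rangle$ itself must come from equation (23).

```python
import math

def fraction_hosting_at_least(n, mean_additional):
    """P(1 + N_s >= n) when N_s is Poisson with mean `mean_additional`.

    Assumes every hosting halo has one galaxy plus N_s additional ones.
    """
    if n <= 1:
        return 1.0
    # Cumulative Poisson probability P(N_s <= n - 2).
    cdf = sum(
        math.exp(-mean_additional) * mean_additional ** k / math.factorial(k)
        for k in range(n - 1)
    )
    return 1.0 - cdf

# Illustrative mean only; in practice it depends on halo mass m.
mean_ns = 1.0
for n in (3, 5, 7):
    print(n, fraction_hosting_at_least(n, mean_ns))
```

For $n = 3$ this reduces to $1 - e^{-\mu}(1+\mu)$ with $\mu = \langle N_s|m\rangle$, which is the closed form written out for $f_3(m)$.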

With this expression for $N_s$ in hand, we can now use
equation (5) along with the observed number $N$ of groups
having $n$ or more galaxies, and our estimate of the total
volume $V$ of the Great Wall to estimate its mass $M$. For
$\sigma_8 = 0.8$, the rms fluctuation on scale $V$ in linear theory
is $\sigma_L = 0.142$. As before (equation 7), we assume a Poisson
distribution for the number of groups, but now with
mean given by equation (25). An important check on our
approach is to perform this analysis for a range of values of
$n$: the inferred mass distribution should not be sensitive to
this choice. For $n = (3, 4, 5, 6, 7, 8, 9, 10)$ the observed number
of groups is $N_{\rm groups} = (335, 199, 132, 96, 75, 68, 60, 49)$
when the link-length is $8h^{-1}$ Mpc. For the longer link-length
$12h^{-1}$ Mpc, $N_{\rm groups} = (645, 361, 219, 155, 117, 96, 84, 69)$.

The curves in the top panel of Figure 7 show a number
of estimates of the mass of the Great Wall, when the
link-length is $8h^{-1}$ Mpc and $\sigma_8 = 0.8$. The dotted curve,
which is shifted towards larger masses than any of the other
curves, is for $n = 3$. This offset may be due to the difficulties
associated with identifying small groups. For $n > 5$, the distributions
overlap: we have shown $n = 6$, 7 and 8. This is a
nontrivial self-consistency test of our method. However, at
$n > 10$ (not shown) the distributions shift further towards
smaller masses; it may be that here we are in the regime
of small number statistics, where the number of groups contributing
to the estimate has dropped below 50, so that Poisson
errors on $N_{\rm groups}$ are more than 10% of $N_{\rm groups}$.
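
To make the small-number-statistics remark concrete, the sketch below tabulates the fractional Poisson error, $1/\sqrt{N}$, for the group counts quoted above for the shorter link-length. It is an illustration only; the helper name `poisson_fractional_error` is ours.

```python
import math

def poisson_fractional_error(count):
    """Fractional Poisson uncertainty, sqrt(N) / N, on a count N."""
    return 1.0 / math.sqrt(count)

# Minimum group richness n and the observed counts quoted in the text
# for the shorter link-length.
richness = [3, 4, 5, 6, 7, 8, 9, 10]
counts = [335, 199, 132, 96, 75, 68, 60, 49]

for n, count in zip(richness, counts):
    print(f"n >= {n:2d}: {count:3d} groups, "
          f"fractional error {poisson_fractional_error(count):.3f}")
```

The fractional error grows steadily with $n$ as the richer samples thin out, which is why the estimates based on the richest groups are the least stable.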

These curves suggest that the total mass in the Wall is
about $10^{16.77}\,h^{-1}M_\odot$, meaning that the structure is about
3.55 times denser than the background. Inserting this into the
mapping between nonlinear and linear overdensity gives the
associated linear theory overdensity $\delta_L$. In terms of the
linear theory rms on this scale, we find $\delta_L/\sigma_L = 4.2$. Using
$\sigma_L(M)$ instead makes this 6.6. The overdensity in halos depends
on $n$; the Wall has 10 times the expected number of halos
when $n = 9$, but 9 times the expected mean number when
$n = 7$. This is consistent with the fact that dense regions are
expected to be overabundant in massive halos, and increasing
$n$ removes lower mass halos. The associated mass fraction
in the observed groups (equation 9) varies from about
40% for $n = 4$ to about 30% for $n = 9$.

The corresponding results when the Wall is defined by
the longer link-length are shown in the bottom panel. In this
case, the total mass in the Wall is $M = 10^{17.1}\,h^{-1}M_\odot$, so it
is 2.25 times the background mass density, making
$\delta_L/\sigma_L = 4.6$; using $\sigma_L(M)$ instead makes this 6.3. The overdensity in
halos is about 5, and the observed groups account for about
20 percent of the total mass. This smaller mass fraction is a
direct consequence of defining the Wall as a looser structure.



5.3 Extreme value statistics 

The dashed curves in the two panels show the estimate of the
mass associated with the extreme value statistics argument
of Section 3. This estimate requires as input the total survey
volume, which we have set equal to the total comoving volume
within $z = 0.2$, making $V_{\rm survey}/V_{\rm Wall} = 3456$ and 1100
for the two (short and long) linking lengths. In contrast to



4 This follows from the Poisson assumption for halo counts in 
cells (M, V), the fact that a random subsample of a Poisson dis- 
tribution is Poisson, and because the distribution of the sum of 
Poisson distributed numbers is Poisson with mean given by the 
sum of the means of the individual distributions. 
Figure 7. Comparison of the excursion set estimate of the mass
of the Sloan Great Wall (solid) with the expected mass returned
from the extreme value statistics approach (dashed) if $\sigma_8 = 0.8$.
Top and bottom panels show results when the Wall and its members
are defined using link-lengths of $8h^{-1}$ Mpc and $12h^{-1}$ Mpc,
respectively. Different solid curves in each panel show the excursion
set results for groups having more than $n = 6$, 7, and 8
members; the dotted curve is for $n = 3$. The excursion set mass
estimate shifts to lower masses as $n$ increases, although it is quite
stable around $n = 7$; it is significantly larger than the estimate
from extreme value statistics.



when we performed this analysis for the Shapley superclus- 
ter, the dashed curve now lies to the left of the solid curves: 
the excursion set estimates of the mass significantly exceed 
those expected based on extreme value statistics. This means 
that, if the excursion set estimates are reliable, then the ex- 
istence of the Wall is difficult to reconcile with the standard 
model. 
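
The size of this tension can be gauged with a deliberately simplified calculation. The Python sketch below treats the survey as $N = V_{\rm survey}/V_{\rm Wall}$ independent cells whose linear overdensities are Gaussian, and asks how likely it is that at least one cell reaches the peak height $\delta_L/\sigma_L$ returned by the excursion set analysis. This is an assumption-laden stand-in for the calculation in the text, which works with the distribution of mass $p(M|V)$; the function names are ours.

```python
import math

def gaussian_tail(nu):
    """P(x > nu) for a unit-variance Gaussian variable."""
    return 0.5 * math.erfc(nu / math.sqrt(2.0))

def prob_max_exceeds(nu, n_cells):
    """P(at least one of n_cells independent Gaussian draws exceeds nu)."""
    # 1 - (1 - tail)^N, written with log1p/expm1 so tiny tails stay accurate.
    return -math.expm1(n_cells * math.log1p(-gaussian_tail(nu)))

# (delta_L / sigma_L, V_survey / V_Wall) quoted in the text for the
# short and long link-lengths when sigma_8 = 0.8.
for nu, n_cells in [(4.2, 3456), (4.6, 1100)]:
    print(f"nu = {nu}, N = {n_cells}: P = {prob_max_exceeds(nu, n_cells):.4f}")
```

Under these assumptions the probabilities are small for both definitions of the Wall, in line with the statement that its existence is difficult to reconcile with the standard model.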
Figure 8. Similar to previous figure, but now $\sigma_8 = 0.9$. The
left-most long dashed curve shows the extreme value result for
the shorter ($8h^{-1}$ Mpc) link-length; the short dashed curve just
to the right of it shows the result of increasing $V_{\rm Wall}$ by 30%, to
approximately account for $z$-space effects. The solid and dotted
curves to the right of this curve show the corresponding excursion
set estimates (we only show the $n = 7$ result). The next set of
long- and short-dashed, solid, and dotted curves show these same
quantities when the Wall is defined by the longer ($12h^{-1}$ Mpc)
link-length.



Increasing $\sigma_8$ alleviates the discrepancy slightly, as Figure 8
illustrates (solid and long-dashed curves). If $\sigma_8 = 0.9$
and $n = 7$, then the excursion set analysis of the structure
defined by the $8h^{-1}$ Mpc link length estimates a mass
overdensity of 3.7, a halo overdensity of 7.7, $\delta_L/\sigma_L = 3.8$
and $\delta_L/\sigma_L(M) = 6.2$. These numbers are 2.2, 4, 4.04 and
5.5 when the link length is $12h^{-1}$ Mpc (Table 2). For either
structure, these are significantly larger than the extreme
value estimate of the expected mass of the densest
object.

The second set of curves associated with each estimate 
(short-dashed and dotted lines) show the result of account- 
ing crudely for redshift-space effects by increasing the Wall 
volume by 30%. To first order, increasing the volume in- 
creases all the mass estimates, but does not change the dis- 
crepancy between the extreme value and excursion set esti- 
mates. This is the basis for our claim earlier that accounting 
for z-space distortions does not change our conclusions. A
more careful look shows that the extreme value and excursion
set mass estimates shift upwards by slightly different
amounts: about 0.1 and 0.05 dex, respectively. As a result,
although the peaks are still quite well-separated, the tails of
the mass estimates overlap slightly more. This means that
the tension between excursion set and extreme value masses
is alleviated somewhat, particularly for the $12h^{-1}$ Mpc
link-length.

Thus, however we define it, the Wall is substantially
more massive than the expected mass of the densest
of $V_{\rm survey}/V_{\rm Wall}$ randomly placed cells. This can be appreciated
directly from the fact that the excursion set analyses
returned estimates of $\delta_L/\sigma_L \sim 4$ for the Wall, compared to
$\approx 2$ for Shapley (for $\sigma_8 = 0.8$), even though $V_{\rm survey}/V_{\rm Wall}$ is
not much larger than $(200/31)^3$.
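
The weak dependence of the expected extreme on the number of cells can be illustrated with a short Python sketch. It assumes independent cells with Gaussian linear overdensities (a simplification of the treatment in the text) and solves $N\,P(>\nu) = 1$ for the height $\nu$ at which one exceedance is expected. The function names and the bisection bracket are our own choices.

```python
import math

def gaussian_tail(nu):
    """P(x > nu) for a unit-variance Gaussian variable."""
    return 0.5 * math.erfc(nu / math.sqrt(2.0))

def typical_max_height(n_cells, lo=0.0, hi=10.0, iterations=100):
    """Solve n_cells * P(> nu) = 1 for nu by bisection (needs n_cells > 2)."""
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if n_cells * gaussian_tail(mid) > 1.0:
            lo = mid  # more than one exceedance expected: threshold too low
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Number of cells for Shapley, (200/31)^3, and for the two Wall definitions.
shapley_cells = (200.0 / 31.0) ** 3
for n_cells in (shapley_cells, 1100, 3456):
    print(f"N = {n_cells:7.0f}: nu = {typical_max_height(n_cells):.2f}")
```

Because the Gaussian tail falls so steeply, a tenfold increase in the number of cells raises this characteristic height by well under one unit of $\sigma_L$, so the difference in survey volume cannot by itself account for a jump from about 2 to about 4.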

It is interesting, therefore, to ask if its mass is also difficult
to reconcile with the peaks model of Section 4.3, which
attempts to account for the fact that the Wall is not just a
randomly placed cell. In this case, an object with the mass
and volume of the Wall would not be unusual only if it is
the largest structure within a few times $10^8\,V_{\rm Wall}$; i.e., essentially
within the Hubble volume.$^5$ Expressed another way, if
$\sigma_8 = 0.8$ then the expected mass of the most extreme peak
within $z = 0.2$ is $10^{16.57}$ or $10^{16.95}$ for our two definitions
of the Wall. Although these are slightly larger than the randomly
placed cells estimate, they are significantly smaller
than the excursion set estimate.



6 DISCUSSION AND AN EXTENSION 

We discussed a number of methods for estimating the masses 
of extreme objects in the Universe, and applied them to two 
of the most dramatic objects in the local Universe: the Shap- 
ley supercluster and the Sloan Great Wall. We used a perco- 
lation analysis to define these systems, and illustrated how 
our results depended on the link-length (8 or $12h^{-1}$ Mpc)
used to define them.

In the case of Shapley, our estimate of the mass comes 
from combining estimates of the masses of its constituents 
with an excursion set analysis of the dependence of the halo
mass function on the density of the local environment. Un- 
fortunately, this was not possible in the case of the Wall, 
since mass estimates of its constituents are not available. 
In this case, we combined the excursion set analysis with a 
Halo-Model interpretation of its constituent groups, them- 
selves identified from (optical) SDSS redshift survey data. 
Unfortunately, this method cannot currently be applied to 
Shapley, since it lies outside the SDSS footprint. This is also 
why we have not included results from the recent analyses 
of the Wall by Einasto et al. (2010, 2011) - but we hope to 
do so soon. 

We compared these mass estimates with that expected 
for the densest object in an appropriately defined 'local' uni- 
verse, and argued that the existence of Shapley is easily ex- 
plained by currently popular models of structure formation 
(Figures E]and |U); its mass ($1.82 \times 10^{16}\,h^{-1}M_\odot$) is consistent
with it being the most massive object of its volume
($1.25 \times 10^5\,h^{-3}\,{\rm Mpc}^3$) within $200h^{-1}$ Mpc.

On the other hand, the Sloan Great Wall (Figure 6) is
difficult to explain, especially if the amplitude of the initial
fluctuation field was at the low end of currently accepted
values (Figures 7 and 8). Its mass $(5.9, 12.6) \times 10^{16}\,h^{-1}M_\odot$
is larger than expected for the most massive object of its
volume $(2.3, 7.2) \times 10^5\,h^{-3}\,{\rm Mpc}^3$ within $z = 0.2$ (where the

5 We used $\delta_L/\sigma_L(M) \sim 6.5$ rather than $\delta_L/\sigma_L \sim 4$ to make this
estimate. The Lognormal estimate of the effective peak height,
5.9, is not very different. Figure 5 shows that large $N_{\rm eff}$, and
hence large volumes, are required to see even one peak of this
height.
Table 2. Estimated initial fluctuation height, mass overdensity, galaxy overdensity and mass of the SDSS Great Wall. The two upper
rows refer to the $8h^{-1}$ Mpc link length; the two lower rows to $12h^{-1}$ Mpc.



two numbers are for defining the Wall using link-lengths of
8 or $12h^{-1}$ Mpc respectively). If $\sigma_8 = 0.8$, then insertion of
the excursion set estimate of its mass in our extreme value
statistics calculation suggests that it must be the densest
object of its volume within the Hubble volume. An analysis
which combines the excursion set estimate of the initial
overdensity associated with the Wall, $\delta/\sigma \sim 6$, with the assumption
that this fluctuation was the largest peak in the
initial conditions, leads to a similar conclusion (Figure 5).

We are hesitant to make strong statements about
whether this makes the Great Wall inconsistent with Gaussian
initial conditions with acceptable values of $\sigma_8$, primarily
because our current numbers are based on assuming the
Wall is spherically symmetric when it clearly is not. For this
reason, we are in the process of extending both our methods
(the excursion set and extreme value statistics analyses)
to account for this. Here we are aided by the fact that
the Wall itself is not virialized. Hence, we can use the simple
parametrization of triaxial collapse from Lam & Sheth
(2008) to generalize the mapping between
nonlinear and linear overdensity. This can then be used in
our excursion set analysis. With this estimate of initial overdensity
and shape in hand, we can modify our extreme value
statistics calculation by supplementing the number density of
initial density peaks of specified scale and height with
the constraint that comes from specifying the shape (e.g.,
Bardeen et al. 1986). This is the subject of work in progress.

Our results suggest that the Sloan Great Wall is about 5
times the volume and about the same factor times the mass
of the Shapley supercluster (we have used the larger mass
and volume estimates of the Wall). So one might wonder if
Shapley is about the sixth most extreme object of its volume
within $z = 0.2$. It is straightforward to extend our application
of extreme value statistics to address this question. In
particular, the same logic which leads to our extreme value
distribution implies that the expected distribution of the mass
of the $n$th densest region is

$$p_n(M|V) \approx \binom{N}{n}\, n\, p(M|V)\, \left[1 - p(<M|V)\right]^{n-1}\, p(<M|V)^{N-n} \qquad (26)$$

(e.g. Gumbel 1966). The dotted curve in Figure shows this
expression, evaluated with $n = 6$, $N = 6375$, and $\sigma_L = 0.24$.
This shows that Shapley could easily be the sixth most massive
object within $z = 0.2$ if $\sigma_8 = 0.8$. Of course, it is trivial
to extend this to our extreme value treatment of peaks: one
simply replaces $p(<M|V) \to \exp[-n_{\rm pk}(>\nu)\,V_{\rm survey}]$. The
Appendix discusses how to modify this approach to account
for the clustering of peaks.
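
The distribution of the $n$th densest of $N$ independent cells is the standard order-statistic density, $n\binom{N}{n}\,p\,[1-P]^{n-1}P^{N-n}$, where $p$ and $P$ are the single-cell density and cumulative distribution. A minimal Python sketch is given below; the function name is ours, and the uniform distribution in the usage example is chosen only because its order statistics are known in closed form, not because it resembles $p(M|V)$.

```python
import math

def nth_largest_pdf(pdf_value, cdf_value, n, N):
    """Density of the n-th largest of N independent draws.

    `pdf_value` and `cdf_value` are the single-draw density and
    cumulative probability evaluated at the same point.
    """
    if cdf_value >= 1.0:
        return pdf_value * N if n == 1 else 0.0
    if cdf_value <= 0.0:
        return pdf_value * N if n == N else 0.0
    # log of n * C(N, n) = N! / [(n - 1)! (N - n)!]
    log_coeff = math.lgamma(N + 1) - math.lgamma(n) - math.lgamma(N - n + 1)
    log_weight = ((n - 1) * math.log1p(-cdf_value)
                  + (N - n) * math.log(cdf_value))
    return pdf_value * math.exp(log_coeff + log_weight)

# Sanity check with a uniform distribution on [0, 1]: pdf = 1, cdf = x.
steps = 20000
grid = [(i + 0.5) / steps for i in range(steps)]
norm = sum(nth_largest_pdf(1.0, x, 6, 50) for x in grid) / steps
mean = sum(x * nth_largest_pdf(1.0, x, 6, 50) for x in grid) / steps
print(norm, mean)
```

Working in logarithms keeps the combinatorial factor from overflowing when $N$ is in the thousands. For a real application, `pdf_value` and `cdf_value` would come from the model for $p(M|V)$, and the far tail would be better handled with a survival function than with `1 - cdf`.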

Similarly, one can write down expressions for the joint
probability distribution of the masses of e.g., Shapley and
the Great Wall, if we require one to be the $i$th and the other
the $j$th most extreme object of its type (recall they may
have different values of $\sigma_L$) in the same survey volume,
although we have not reproduced them here.

One of the surprises of these analyses is, perhaps, the
precision of the mass estimates they return: typically, these
are of order 15%, both for the excursion set and the extreme
value statistics approaches. Although we provided
some analysis for why this is so (equation 13), it would have
been nice to test our mass estimates by combining the motions
of the clusters in these systems with an infall model.
However, because the Shapley supercluster and the Sloan 
Great Wall are both far from round (e.g. Section [2]), esti- 
mates based on the spherical collapse model are inappropri- 
ate. Therefore, we are currently in the process of developing 
an infall model based on the assumption of a triaxial col- 
lapse. 

The precision of the mass estimates derives from the
fact that the extreme fluctuations we are considering are
from Gaussian random fields, in which extreme fluctuations
are rare, so events on the tail of the distribution will be
similar to one another. However, it is almost certain that,
at least for the extreme value statistics calculation, this is
more generic. This is because a large class of initial distributions
have, as their limiting extreme value statistic, a double-exponential
form (Fisher & Tippett 1928; Gumbel 1966). In
the astrophysical context, this Fisher-Tippett or Gumbel distribution,
and the study of extreme value statistics in general,
has a long history in the study of the brightest galaxies
in clusters (Scott 1957; Bhavsar & Barrow 1985). Our work
suggests that extreme value statistics may continue to provide
insight into the study of the largest structures in the
Universe.

In particular, it would be interesting to use this ap- 
proach to see if the sizes of the largest voids, or the masses 
of the most massive clusters or superclusters (e.g. Lupar- 
ello et al. 2011; Schirmer et al. 2011; Yaryura et al. 2011), 
are consistent with the hypothesis that the initial fluctuation 
field was Gaussian. To use our approach for more generic ini- 
tial conditions, one must know how the halo mass function 
depends on the large scale environment and one must have 
a model for the nonlinear probability distribution function. 
For non-Gaussian initial conditions of the local type, such
models have recently become available (Lam & Sheth 2009).
APPENDIX A: ON THE APPROXIMATION OF
INDEPENDENT CELLS WHEN CALCULATING 
EXTREME VALUES OF SPATIAL STATISTICS 

The calculation of extreme value statistics reduces to one of 
writing the probability that, of n draws from a distribution, 
none are above a certain value. This raises the question of 
whether or not the draws can be assumed to be independent 
picks. For the spatial statistics we are considering here, in 
which each cell represents a pick, and the total volume is 
the sum of the cells, the answer is clearly 'no' because there 
are correlations between the cells. On the other hand, since 
the correlations decrease with cell separation, most cells will 
only be strongly correlated with a few nearby cells. More- 
over, since we will generally be interested in large cells, even 
nearby cells are likely to be only weakly correlated. So the
assumption of independence may in fact be quite good. The
question is: Are extreme value statistics likely to be distorted
by even these weak correlations? After all, the whole point
of such statistics is that they are sensitive to the tails of
the distribution, and these are where (fractional) changes to 
the distribution will be largest. In what follows, we quantify 
this effect. 

To proceed, we need an expression for the joint distribution
of $n$ draws. We will first use a multivariate Gaussian to
illustrate the argument, and then discuss possible generalizations.
If $\delta_i$ denotes the value of the field at position $i$, then
the multivariate Gaussian distribution is specified by the covariance
matrix $C$, the elements of which are $C_{ij} = \langle\delta_i\delta_j\rangle$
(we are assuming $\langle\delta_i\rangle = 0$ for all $i$). In our case, $C_{ij}$ will be
a function of the separation $r$ between cells $i$ and $j$. Namely,



$$C_{ij} = \frac{1}{\sigma_{ii}(0)\,\sigma_{jj}(0)} \int \frac{dk}{k}\, \frac{k^3 P(k)}{2\pi^2}\, W(kR_i)\, W(kR_j)\, \frac{\sin(kr)}{kr}, \qquad (A1)$$

where we have allowed for the fact that the cells of interest at
position i may have a different size than those at position 
j. In the main text we were primarily interested in the case 
Ri = Rj. If the Ri are large, and/or the separation between 
cells is large, then C will be close to diagonal, so the n-point 
distribution will be well-approximated by the product of n 
1-point distribution functions. As a result, 

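$$\int_{-\infty}^{\delta_c} d\delta_1 \cdots \int_{-\infty}^{\delta_c} d\delta_n\; p(\delta_1, \ldots, \delta_n) \approx \prod_{i=1}^{n} \int_{-\infty}^{\delta_c} d\delta_i\; p(\delta_i). \qquad (A2)$$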

This is the approximation used in equation (10) of the
main text. The leading order correction to this can be obtained
by writing this in terms of integrals above $\delta_c$, and
then using previous results for high peaks or dense patches
(Bardeen et al. 1986; Jensen & Szalay 1986) to evaluate the
result, which shows that the expression gets a correction factor
which, to lowest order, depends on the two-point correlation
function of regions above $\delta_c$.
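
The size of the effect can be illustrated for just two cells. The Python sketch below compares the exact probability that two correlated unit-variance Gaussian variables both lie below a threshold with the product of the one-point probabilities. It is an illustration under our own choices: the function names, the threshold of 3 and the correlation coefficient of 0.2 are not taken from the analysis in the text.

```python
import math

def phi(x):
    """Unit Gaussian density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Unit Gaussian cumulative distribution."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def joint_below(threshold, rho, steps=20000, lower=-10.0):
    """P(x1 < threshold, x2 < threshold) for correlation rho, |rho| < 1.

    Trapezoidal integration of phi(x1) * P(x2 < threshold | x1) over x1,
    using the fact that x2 given x1 is Gaussian with mean rho * x1 and
    variance 1 - rho^2.
    """
    scale = math.sqrt(1.0 - rho * rho)
    h = (threshold - lower) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lower + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * phi(x) * Phi((threshold - rho * x) / scale)
    return total * h

threshold, rho = 3.0, 0.2
independent = Phi(threshold) ** 2
correlated = joint_below(threshold, rho)
print(independent, correlated, correlated - independent)
```

For positive correlation the joint probability is slightly larger than the product, so treating the cells as independent slightly overstates the chance that at least one of them exceeds the threshold.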

In practice, the present day 1-point distribution func- 
tion is no longer Gaussian. However, on large scales, it may 
be a good approximation to assume that there is a mono- 
tonic mapping between the nonlinear overdensity and the 
linear one. E.g., the main text assumes that this mapping is 
well approximated by a lognormal. If one assumes that this 
is also true of the n-point distribution function, then we have 
a fully specified model of the nonlinear n-point function, ex- 
pressed in terms of the initial Gaussian covariance matrix. 
Now, the extreme value statistics care about the cumulative 
distribution: the monotonicity of the mapping means that 
the net effect of nonlinear evolution is simply to shift the 
threshold of the corresponding multivariate (linear theory) 
Gaussian. Once this shift has been applied, then the previ- 
ous analysis of the Gaussian case goes through in its entirety. 
This justifies our use of equation (10), and also shows how
it might be improved. 
A1 Including the clustering of extrema



Equation (21) in the main text follows from the assumption
that peaks are uncorrelated, so the probability that there
are no peaks in $V_{\rm survey}$ is given by the Poisson expression
$\exp(-n_{\rm pk}V_{\rm survey})$. This can be derived from equation (A2)
by taking the limit of infinite sampling (in which $n \to \infty$,
so the typical spacing between the cells is no longer of order
their size). Going beyond the Poisson model requires a
calculation of the higher order correlation functions (White
1979). These are only known approximately (Appendix F in
Bardeen et al. 1986). On large scales where these are small,
the required replacement in equation (21) is



$$n_{\rm pk}(>\nu)\,V_{\rm survey} \to n_{\rm pk}(>\nu)\,V_{\rm survey} \left[1 - n_{\rm pk}(>\nu)\,V_{\rm survey}\, \frac{\bar\xi_{\rm pk}}{2}\right],$$
where, for high peaks on large scales, 

[™ P k(> v) Vs urvcy ] £pk W [n pk (> ^)fopk(> Z^Vsurvcy] C 



2 , Q s exp(-i/72) 



2tt 



2 _2 



O"o(-Rsurvoy) 



(A3) 



Including this extra term affects the distributions shown in
Figure 5 for $N_{\rm eff} < 10^3$ or so (the peak shifts to slightly
larger $\nu$) but matters little for larger $N_{\rm eff}$.
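
As a rough illustration of how clustering enters, the Python sketch below evaluates the probability of finding no peaks in the survey volume when only the two-point term of the expansion in correlation functions (White 1979) is kept, i.e. $\exp\{-\bar N\,[1 - \bar N\,\bar\xi/2]\}$ with $\bar N = n_{\rm pk}V_{\rm survey}$. The function name and the sample values of $\bar N$ and $\bar\xi$ are hypothetical, and the truncation is only meaningful when $\bar N\,\bar\xi$ is small.

```python
import math

def prob_no_peaks(mean_count, xi_bar=0.0):
    """Probability that the volume contains no peaks.

    mean_count : expected number of peaks, n_pk * V_survey
    xi_bar     : volume-averaged two-point correlation of the peaks
                 (xi_bar = 0 recovers the Poisson expression)
    """
    return math.exp(-mean_count * (1.0 - 0.5 * mean_count * xi_bar))

# Illustrative values only.
for xi_bar in (0.0, 0.05, 0.1):
    print(xi_bar, prob_no_peaks(1.0, xi_bar))
```

Positive clustering raises the probability of an empty volume relative to the Poisson case, because peaks that do occur tend to occur together.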



